rank function
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- Information Technology (0.67)
- Banking & Finance (0.45)
- North America > United States (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > New York (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (3 more...)
- North America > United States (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
SkipPredict: When to Invest in Predictions for Scheduling Rana Shahout Harvard University Michael Mitzenmacher Harvard University
Expanding on recent work on scheduling with predicted job sizes, we consider the effect of the cost of predictions in queueing systems, removing the assumption in prior research that predictions are external to the system's resources and/or cost-free. Additionally, we introduce a novel approach to utilizing predictions, SkipPredict, designed to address their inherent cost. Rather than uniformly applying predictions to all jobs, we propose a tailored approach that categorizes jobs to improve the effectiveness of prediction on performance. To achieve this, we employ one-bit "cheap predictions" to classify jobs as either short or long. SkipPredict prioritizes predicted short jobs over long jobs, and for the long jobs, SkipPredict applies a second round of more detailed "expensive predictions" to approximate Shortest Remaining Processing Time for these jobs.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- Information Technology (0.67)
- Banking & Finance (0.45)
Efficient Differentiable Approximation of Generalized Low-rank Regularization
Li, Naiqi, Xie, Yuqiu, Liu, Peiyuan, Dai, Tao, Jiang, Yong, Xia, Shu-Tao
Low-rank regularization (LRR) has been widely applied in various machine learning tasks, but the associated optimization is challenging. Directly optimizing the rank function under constraints is NP-hard in general. To overcome this difficulty, various relaxations of the rank function were studied. However, optimization of these relaxed LRRs typically depends on singular value decomposition, which is a time-consuming and nondifferentiable operator that cannot be optimized with gradient-based techniques. To address these challenges, in this paper we propose an efficient differentiable approximation of the generalized LRR. The considered LRR form subsumes many popular choices like the nuclear norm, the Schatten-$p$ norm, and various nonconvex relaxations. Our method enables LRR terms to be appended to loss functions in a plug-and-play fashion, and the GPU-friendly operations enable efficient and convenient implementation. Furthermore, convergence analysis is presented, which rigorously shows that both the bias and the variance of our rank estimator rapidly reduce with increased sample size and iteration steps. In the experimental study, the proposed method is applied to various tasks, which demonstrates its versatility and efficiency. Code is available at https://github.com/naiqili/EDLRR.
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > Middle East > Israel (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Don't Stop Me Now: Embedding Based Scheduling for LLMs
Shahout, Rana, Malach, Eran, Liu, Chunwei, Jiang, Weifan, Yu, Minlan, Mitzenmacher, Michael
Efficient scheduling is crucial for interactive Large Language Model (LLM) applications, where low request completion time directly impacts user engagement. Size-based scheduling algorithms like Shortest Remaining Process Time (SRPT) aim to reduce average request completion time by leveraging known or estimated request sizes and allowing preemption by incoming jobs with shorter service times. However, two main challenges arise when applying size-based scheduling to LLM systems. First, accurately predicting output lengths from prompts is challenging and often resource-intensive, making it impractical for many systems. As a result, the state-of-the-art LLM systems default to first-come, first-served scheduling, which can lead to head-of-line blocking and reduced system efficiency. Second, preemption introduces extra memory overhead to LLM systems as they must maintain intermediate states for unfinished (preempted) requests. In this paper, we propose TRAIL, a method to obtain output predictions from the target LLM itself. After generating each output token, we recycle the embedding of its internal structure as input for a lightweight classifier that predicts the remaining length for each running request. Using these predictions, we propose a prediction-based SRPT variant with limited preemption designed to account for memory overhead in LLM systems. This variant allows preemption early in request execution when memory consumption is low but restricts preemption as requests approach completion to optimize resource utilization. On the theoretical side, we derive a closed-form formula for this SRPT variant in an M/G/1 queue model, which demonstrates its potential value. In our system, we implement this preemption policy alongside our embedding-based prediction method.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
A Multi-step Inertial Forward-Backward Splitting Method for Non-convex Optimization
We propose a multi-step inertial Forward-Backward splitting algorithm for minimizing the sum of two non-necessarily convex functions, one of which is proper lower semi-continuous while the other is differentiable with a Lipschitz continuous gradient. We first prove global convergence of the algorithm with the help of the Kurdyka-Łojasiewicz property. Then, when the non-smooth part is also partly smooth relative to a smooth submanifold, we establish finite identification of the latter and provide sharp local linear convergence analysis. The proposed method is illustrated on several problems arising from statistics and machine learning.
- North America > United States > New York (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
Deep Submodular Functions: Definitions & Learning Brian Dolhansky Dept. of Electrical Engineering
We propose and study a new class of submodular functions called deep submodular functions (DSFs). We define DSFs and situate them within the broader context of classes of submodular functions in relationship both to various matroid ranks and sums of concave composed with modular functions (SCMs). Notably, we find that DSFs constitute a strictly broader class than SCMs, thus motivating their use, but that they do not comprise all submodular functions. Interestingly, some DSFs can be seen as special cases of certain deep neural networks (DNNs), hence the name. Finally, we provide a method to learn DSFs in a max-margin framework, and offer preliminary results applying this both to synthetic and real-world data instances.
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (3 more...)